model design
Token Is All You Price
We develop a mechanism design framework in which a platform designs GenAI models to screen users who obtain instrumental value from the generated conversation and privately differ in their preference for latency. We show that the revenue-optimal mechanism is simple: deploy a single aligned (user-optimal) model and use a token cap as the only screening instrument. The design decouples model training from pricing, is readily implemented with token metering, and mitigates misalignment pressures.
- Europe > Kosovo > District of Gjilan > Kamenica (0.04)
- North America > United States > New Jersey > Mercer County > Princeton (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
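The token-cap screening idea above can be illustrated with a standard hidden-type menu sketch. Everything here is a hypothetical two-type toy, not the paper's model: I assume a user of type theta values a cap of q tokens at theta·sqrt(q), the platform pays a per-token cost c, and posts a menu of (cap, price) plans where the low type's participation constraint and the high type's incentive constraint bind.

```python
import math

# Hypothetical two-type screening toy (NOT the paper's model): a user of
# type theta values a cap of q tokens at theta * sqrt(q); the platform
# pays c per token and posts a menu of (cap, price) plans.
def optimal_menu(theta_L=1.0, theta_H=1.5, n_L=0.5, n_H=0.5, c=0.1, max_cap=200):
    best = None
    for qL in range(1, max_cap + 1):
        for qH in range(qL, max_cap + 1):
            # classic binding constraints: IR for the low type, IC for the high type
            pL = theta_L * math.sqrt(qL)
            pH = theta_H * math.sqrt(qH) - (theta_H - theta_L) * math.sqrt(qL)
            profit = n_L * (pL - c * qL) + n_H * (pH - c * qH)
            if best is None or profit > best[0]:
                best = (profit, (qL, pL), (qH, pH))
    return best

profit, (qL, pL), (qH, pH) = optimal_menu()
# The low type's cap is distorted below its efficient level, while the
# high type's cap stays efficient: the caps alone separate the two types.
```

With these toy parameters the high type's cap sits at its first-best level (sqrt(q) = theta_H / 2c) while the low type's cap is pushed well below first-best, which is the usual screening distortion the abstract's "token cap as the only instrument" result relies on.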
On What Depends the Robustness of Multi-source Models to Missing Data in Earth Observation?
Mena, Francisco, Arenas, Diego, Miranda, Miro, Dengel, Andreas
Francisco Mena 1,2, Diego Arenas 2, Miro Miranda 1,2, and Andreas Dengel 1,2 (1 University of Kaiserslautern-Landau (RPTU), Kaiserslautern, Germany; 2 German Research Center for Artificial Intelligence (DFKI), Kaiserslautern, Germany). Abstract -- In recent years, robust multi-source models have emerged in the Earth Observation (EO) field: models that leverage data from diverse sources to improve predictive accuracy when data are missing. Despite these advancements, the factors behind the varying effectiveness of such models remain poorly understood. In this study, we evaluate the predictive performance of six state-of-the-art multi-source models in scenarios where either a single data source is missing or only a single source is available. Our analysis reveals that the efficacy of these models is intricately tied to the nature of the task, the complementarity among data sources, and the model design. Surprisingly, we observe instances where removing certain data sources improves predictive performance, challenging the assumption that incorporating all available data is always beneficial.
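The single-source-missing evaluation can be made concrete with a minimal ablation harness. This is a toy stand-in, not the paper's models or EO data: I assume two synthetic "sources" of unequal informativeness and a fixed linear scorer, and zero-fill one source at test time to measure the degradation.

```python
import random

# Toy ablation harness (not the paper's setup): the label depends on two
# "sources"; at test time we zero-fill one source's features and
# measure how much accuracy degrades.
random.seed(0)

def sample(n):
    data = []
    for _ in range(n):
        s1 = random.gauss(0, 1)               # e.g. an optical feature
        s2 = random.gauss(0, 1)               # e.g. a radar feature
        y = 1 if s1 + 0.3 * s2 > 0 else 0     # source 1 dominates the label
        data.append((s1, s2, y))
    return data

def accuracy(data, use_s1=True, use_s2=True):
    correct = 0
    for s1, s2, y in data:
        score = (s1 if use_s1 else 0.0) + 0.3 * (s2 if use_s2 else 0.0)
        correct += int((1 if score > 0 else 0) == y)
    return correct / len(data)

test = sample(5000)
full = accuracy(test)
no_s2 = accuracy(test, use_s2=False)   # drop the weaker source
no_s1 = accuracy(test, use_s1=False)   # drop the dominant source
# Dropping the dominant source hurts far more than dropping the weaker
# one: "robustness to missing data" depends on source complementarity.
```

The gap between `no_s1` and `no_s2` is the toy analogue of the paper's finding that robustness depends on how much unique signal each source carries, not just on how many sources the model sees.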
Cardiomyopathy Diagnosis Model from Endomyocardial Biopsy Specimens: Appropriate Feature Space and Class Boundary in Small Sample Size Data
Mori, Masaya, Omae, Yuto, Koyama, Yutaka, Hara, Kazuyuki, Toyotani, Jun, Okumura, Yasuo, Hao, Hiroyuki
As the number of patients with heart failure increases, machine learning (ML) has garnered attention in cardiomyopathy diagnosis, driven by the shortage of pathologists. However, endomyocardial biopsy specimens often have small sample sizes and require techniques such as feature extraction and dimensionality reduction. This study aims to determine whether texture features are effective for feature extraction in the pathological diagnosis of cardiomyopathy. Furthermore, model designs that contribute toward improving generalization performance are examined by applying feature selection (FS) and dimensional compression (DC) to several ML models. The results were verified by visualizing inter-class distribution differences and conducting statistical hypothesis testing based on texture features. Additionally, they were evaluated using predictive performance across different model designs with varying combinations of FS and DC (applied or not) and decision boundaries. The results confirmed that texture features may be effective for the pathological diagnosis of cardiomyopathy. Moreover, when the ratio of features to sample size is high, a multi-step process involving FS and DC improved generalization performance, with the linear kernel support vector machine achieving the best results. This process was demonstrated to be potentially effective for models with reduced complexity, regardless of whether the decision boundaries were linear, curved, perpendicular, or parallel to the axes. These findings are expected to facilitate the development of an effective cardiomyopathy diagnostic model for its rapid adoption in medical practice.
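The multi-step FS-then-DC idea can be sketched on toy data. This is a minimal stand-in, not the paper's pipeline: I assume synthetic features where only the first few carry class signal, select features by class-mean separation, compress them to one dimension by averaging, and apply a simple linear threshold.

```python
import random

# Minimal sketch of the multi-step idea (feature selection, then
# dimensional compression, then a linear boundary) on toy data; the
# paper's actual texture features, FS/DC methods, and models differ.
random.seed(1)

def make_sample(label, n_feat=50, n_informative=5):
    # only the first few features carry class signal; the rest are noise
    return [random.gauss(label if i < n_informative else 0.0, 1.0)
            for i in range(n_feat)], label

train = [make_sample(random.choice([0, 1])) for _ in range(200)]

def select_features(data, k=5):
    # FS step: rank features by absolute class-mean separation
    n_feat = len(data[0][0])
    scores = []
    for i in range(n_feat):
        m0 = [x[i] for x, y in data if y == 0]
        m1 = [x[i] for x, y in data if y == 1]
        gap = abs(sum(m1) / len(m1) - sum(m0) / len(m0))
        scores.append((gap, i))
    return [i for _, i in sorted(scores, reverse=True)[:k]]

def compress(x, idx):
    # DC step: crude 1-D compression by averaging the selected features
    return sum(x[i] for i in idx) / len(idx)

idx = select_features(train)
threshold = 0.5  # midpoint of the two class means (0 and 1)

def predict(x):
    return 1 if compress(x, idx) > threshold else 0

acc = sum(predict(x) == y for x, y in train) / len(train)
```

With 50 features and 200 samples the feature-to-sample ratio is high, which is exactly the regime where the abstract reports that an FS-then-DC pipeline helps a low-complexity linear classifier generalize.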
Review for NeurIPS paper: A Causal View on Robustness of Neural Networks
Additional Feedback: Given fundamental limits of network robustness to adversarial attacks (see "Limitations of Adversarial Robustness: Strong No Free Lunch Theorem"), where does the proposed method differ from, or relate to, that general framework for robustness and adversaries? Does the causality framework provide a "way out" from the bounds and limits shown in that work? The lack of robustness to horizontal and vertical shift in the MNIST example seems as coupled to the architectural bias of the particular discriminator design as to the task itself; for example, an object detection framework such as RCNN or modern variants (e.g., Mask-RCNN) should have little issue with the shifted-image task described in the paper. How can we separate the issue of network design (which is frequently driven by known invariances in the desired domain, such as moving from simple DNNs to more applicable CNNs) from the causal manipulation model (which also has design parameters and potential pitfalls, as discussed in 3.2 and 4.2)? If using some kind of automated network design setting (such as meta-learning or evolutionary approaches), would the CAMA model design and the discriminator need to be designed in conjunction, or through some kind of back-and-forth iteration?
Vertical LoRA: Dense Expectation-Maximization Interpretation of Transformers
In recent years, the field of machine learning, especially natural language processing (NLP), has witnessed a transformative evolution, primarily catalyzed by the advent of Transformer models and large language models. These models are known for their emergent ability to comprehend and generate human-like text. Specifically, Transformer models seem to undergo a transformative evolution as parameter counts grow, achieving unprecedented performance across a spectrum of tasks, including text generation, machine translation, text summarization, question answering, and visual understanding. This finding has led to a trend of scaling models up to millions and even billions of parameters, exemplified by OpenAI's GPT[1, 2], Google's BERT[3], Meta's Llama[4], and Anthropic's Claude[5]. However, this growth in model size has simultaneously raised a significant barrier for ordinary individuals seeking to train these models on consumer hardware.
- North America > United States > New York (0.04)
- Europe > Romania > Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)
Digital Business Model Analysis Using a Large Language Model
Watanabe, Masahiro, Uchihira, Naoshi
Digital transformation (DX) has recently become a pressing issue for many companies as the latest digital technologies, such as artificial intelligence and the Internet of Things, can be easily utilized. However, devising new business models is not easy for companies, even though they can improve their operations through digital technologies. Thus, business model design support methods are needed by people who lack digital technology expertise. Meanwhile, large language models (LLMs), represented by ChatGPT, and natural language processing utilizing LLMs have developed rapidly. A business model design support system that utilizes these technologies has great potential. However, research on this area is scant. Accordingly, this study proposes an LLM-based method for comparing and analyzing similar companies from different business domains as a first step toward business model design support utilizing LLMs. This method can support idea generation in digital business model design.
- Health & Medicine (0.40)
- Information Technology (0.36)
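The cross-domain comparison step could be driven by a simple prompt template. This is a hypothetical sketch only; the study's actual prompts, comparison criteria, and company pairs are not specified here.

```python
# Hypothetical prompt template for the cross-domain comparison step;
# the criteria listed below are illustrative assumptions, not the
# study's actual analysis dimensions.
TEMPLATE = (
    "Compare the business models of {a} and {b}.\n"
    "For each, describe: customer segments, value proposition, revenue streams.\n"
    "Then list structural similarities that could transfer across domains."
)

def build_prompt(company_a, company_b):
    return TEMPLATE.format(a=company_a, b=company_b)

prompt = build_prompt("a ride-sharing platform", "a food-delivery platform")
# The filled prompt would be sent to an LLM; its answer seeds idea
# generation for a new digital business model.
```

Keeping the comparison dimensions explicit in the template is what makes the LLM's output comparable across company pairs, which is the premise of using such analyses as design support.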
Novel Approaches for ML-Assisted Particle Track Reconstruction and Hit Clustering
Odyurt, Uraz, Dobreva, Nadezhda, Wolffs, Zef, Zhao, Yue, Sánchez, Antonio Ferrer, Bazan, Roberto Ruiz de Austri, Martín-Guerrero, José D., Varbanescu, Ana-Lucia, Caron, Sascha
Track reconstruction is a vital aspect of High-Energy Physics (HEP) and plays a critical role in major experiments. In this study, we delve into unexplored avenues for particle track reconstruction and hit clustering. Firstly, we enhance the algorithmic design effort by utilising a simplified simulator (REDVID) to generate training data that is specifically composed for simplicity. We demonstrate the effectiveness of this data in guiding the development of optimal network architectures. Additionally, we investigate the application of image segmentation networks for this task, exploring their potential for accurate track reconstruction. Moreover, we approach the task from a different perspective by treating it as a hit sequence to track sequence translation problem. Specifically, we explore the utilisation of Transformer architectures for tracking purposes. Our preliminary findings are covered in detail. By considering this novel approach, we aim to uncover new insights and potential advancements in track reconstruction. This research sheds light on previously unexplored methods and provides valuable insights for the field of particle track reconstruction and hit clustering in HEP.
- Europe > Spain > Valencian Community > Valencia Province > Valencia (0.04)
- North America > Cuba > Artemisa Province > Artemisa (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Research Report > Promising Solution (0.60)
- Overview > Innovation (0.60)
- Research Report > New Finding (0.54)
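The "hit sequence to track sequence translation" framing above can be sketched as a tokenization scheme. This is a hypothetical illustration, not the paper's encoding: I assume hit coordinates are discretized into a small vocabulary so an off-the-shelf seq2seq Transformer could consume them, with one track-id token per hit as the target.

```python
# Hypothetical sketch of the "hits -> tracks as translation" framing:
# discretise each hit's coordinates into vocabulary tokens so a
# standard seq2seq model (e.g. a Transformer) can consume them.
BINS = 100

def hit_to_tokens(x, y, z, lo=-1.0, hi=1.0):
    def bucket(v):
        v = min(max(v, lo), hi)
        return int((v - lo) / (hi - lo) * (BINS - 1))
    return [f"x{bucket(x)}", f"y{bucket(y)}", f"z{bucket(z)}"]

def encode_event(hits):
    # source sequence: flattened hit tokens separated by a marker token
    seq = []
    for h in hits:
        seq += hit_to_tokens(*h) + ["<hit>"]
    return seq

def target_sequence(track_ids):
    # target sequence: one track-id token per hit, in the same order
    return [f"t{t}" for t in track_ids]

src = encode_event([(0.1, -0.2, 0.5), (0.11, -0.19, 0.52)])
tgt = target_sequence([3, 3])
```

Under this framing, training reduces to ordinary sequence-to-sequence learning: the model reads the discretized hit tokens and emits the track assignment for each hit, so nearby hits (like the two above) should map to the same track token.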
AI Fairness in Practice
Leslie, David, Rincon, Cami, Briggs, Morgan, Perini, Antonella, Jayadeva, Smera, Borda, Ann, Bennett, SJ, Burr, Christopher, Aitken, Mhairi, Katell, Michael, Fischer, Claudia, Wong, Janis, Garcia, Ismael Kherroubi
Reaching consensus on a commonly accepted definition of AI Fairness has long been a central challenge in AI ethics and governance. There is a broad spectrum of views across society on what the concept of fairness means and how it should best be put to practice. In this workbook, we tackle this challenge by exploring how a context-based and society-centred approach to understanding AI Fairness can help project teams better identify, mitigate, and manage the many ways that unfair bias and discrimination can crop up across the AI project workflow. We begin by exploring how, despite the plurality of understandings about the meaning of fairness, priorities of equality and non-discrimination have come to constitute the broadly accepted core of its application as a practical principle. We focus on how these priorities manifest in the form of equal protection from direct and indirect discrimination and from discriminatory harassment. These elements form ethical and legal criteria based upon which instances of unfair bias and discrimination can be identified and mitigated across the AI project workflow. We then take a deeper dive into how the different contexts of the AI project lifecycle give rise to different fairness concerns. This allows us to identify several types of AI Fairness (Data Fairness, Application Fairness, Model Design and Development Fairness, Metric-Based Fairness, System Implementation Fairness, and Ecosystem Fairness) that form the basis of a multi-lens approach to bias identification, mitigation, and management. Building on this, we discuss how to put the principle of AI Fairness into practice across the AI project workflow through Bias Self-Assessment and Bias Risk Management as well as through the documentation of metric-based fairness criteria in a Fairness Position Statement.
- Europe > United Kingdom > Wales (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- (9 more...)
- Workflow (1.00)
- Research Report > Experimental Study (1.00)
- Instructional Material > Course Syllabus & Notes (0.67)
- Law > Civil Rights & Constitutional Law (1.00)
- Information Technology > Security & Privacy (1.00)
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- (10 more...)
Leveraging Open Information Extraction for Improving Few-Shot Trigger Detection Domain Transfer
Dukić, David, Gashteovski, Kiril, Glavaš, Goran, Šnajder, Jan
Event detection is a crucial information extraction task in many domains, such as Wikipedia or news. The task typically relies on trigger detection (TD) -- identifying token spans in the text that evoke specific events. While the notion of triggers should ideally be universal across domains, domain transfer for TD from high- to low-resource domains results in significant performance drops. We address the problem of negative transfer for TD by coupling triggers between domains using subject-object relations obtained from a rule-based open information extraction (OIE) system. We demonstrate that relations injected through multi-task training can act as mediators between triggers in different domains, enhancing zero- and few-shot TD domain transfer and reducing negative transfer, in particular when transferring from a high-resource source Wikipedia domain to a low-resource target news domain. Additionally, we combine the extracted relations with masked language modeling on the target domain and obtain further TD performance gains. Finally, we demonstrate that the results are robust to the choice of the OIE system.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
- (10 more...)
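The rule-based OIE component above can be illustrated with a toy pattern matcher. Real OIE systems are far more sophisticated; this sketch only shows the kind of (subject, relation, object) triple that would be injected as an auxiliary signal, and the pattern below is my assumption, not the system the paper uses.

```python
import re

# Toy pattern-based extraction in the spirit of rule-based OIE: match
# simple "<Subject> <verb> <object>." sentences and return a triple.
# Real OIE systems handle far richer syntax than this single pattern.
PATTERN = re.compile(r"^(?P<subj>[A-Z][\w ]*?) (?P<rel>\w+ed|\w+s) (?P<obj>[\w ]+)\.$")

def extract_triple(sentence):
    m = PATTERN.match(sentence)
    if not m:
        return None
    return (m.group("subj"), m.group("rel"), m.group("obj"))

triple = extract_triple("Protesters stormed the parliament.")
# -> the relation "stormed" overlaps with an event trigger, which is
# what lets relations mediate between triggers across domains.
```

In a multi-task setup, triples like this would feed an auxiliary relation-prediction head alongside the trigger detection head; because relation verbs often coincide with triggers, the shared encoder learns trigger-relevant structure that transfers from the high-resource to the low-resource domain.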
A Field Guide to Scientific XAI: Transparent and Interpretable Deep Learning for Bioinformatics Research
Quinn, Thomas P, Gupta, Sunil, Venkatesh, Svetha, Le, Vuong
Deep learning has become popular because of its potential to achieve high accuracy in prediction tasks. However, accuracy is not always the only goal of statistical modelling, especially for models developed as part of scientific research. Rather, many scientific models are developed to facilitate scientific discovery, by which we mean to abstract a human-understandable representation of the natural world. Unfortunately, the opacity of deep neural networks limits their role in scientific discovery, creating a new demand for models that are transparently interpretable. This article is a field guide to transparent model design. It provides a taxonomy of transparent model design concepts, a practical workflow for putting design concepts into practice, and a general template for reporting design choices. We hope this field guide will help researchers more effectively design transparently interpretable models, and thus enable them to use deep learning for scientific discovery.
- Oceania > Australia (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- Europe > Montenegro (0.04)